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Abstract 

The evolution of ants is marked by remarkable adaptations that allowed the development of very complex social systems. 
To identify how ant-specific adaptations are associated with patterns of molecular evolution, we searched for signs of 
positive selection on ammo-acid changes in proteins. We identified 24 functional categories of genes which were enriched 
for positively selected genes in the ant lineage. We also reanalyzed genome-wide data sets in bees and flies with the same 
methodology to check whether positive selection was specific to ants or also present in other insects. Notably, genes 
implicated in immunity were enriched for positively selected genes in the three lineages, ruling out the hypothesis that 
the evolution of hygienic behaviors in social insects caused a major relaxation of selective pressure on immune genes. Our 
scan also indicated that genes implicated in neurogenesis and olfaction started to undergo increased positive selection 
before the evolution of sociality in Hymenoptera. Finally, the comparison between these three lineages allowed us to 
pinpoint molecular evolution patterns that were specific to the ant lineage. In particular, there was ant-specific recurrent 
positive selection on genes with mitochondrial functions, suggesting that mitochondrial activity was improved during the 
evolution of this lineage. This might have been an important step toward the evolution of extreme lifespan that is a 
hallmark of ants. 

Key words: comparative genomics, sociality, d N /ds, aging, lifespan, immunity, neurogenesis, olfactory receptors, metab- 
olism, Hymenoptera, bees, Drosophila. 



Introduction 

Ants constitute an extremely successful lineage of animals 
which has colonized virtually all ecosystems on Earth 
(Holldobler and Wilson 1990). The pivotal feature at the 
basis of this ecological success is their highly social system 
with a reproductive division of labor, where one or a few 
queens specialize in reproduction, whereas workers conduct 
all the colony tasks such as brood care, nest maintenance, and 
food collection. In this article, we take advantage of the recent 
availability of seven sequenced ant genomes (Bonasio et al. 
2010; Nygaard et al. 2011; Smith, Zimin, et al. 2011; Smith, 
Smith, et al. 2011; Suen et al. 2011; Wurm et al. 2011) to 
perform a genome-wide scan for positive selection on 
amino-acid changes in protein coding genes during the evo- 
lution of the ant lineage. We addressed three main questions. 

First, we compared the amount of positive selection in 
functional categories of genes. Previous large-scale scans for 
positive selection in animals indicated that positive selection 
predominantly affects certain types of genes, such as those 
involved in evolutionary arms races, sexual selection, or con- 
flicts with pathogens (Bakewell et al. 2007; Drosophila 12 



Genomes Consortium 2007; Kosiol et al. 2008; Vamathevan 
et al. 2008; Oliver et al. 2010; George et al. 201 1; Woodard et al. 
2011). Such genes experienced positive selection events re- 
currently on broad evolutionary time scales, and it is likely 
that they contribute to a fraction of the positive selection 
events that occurred in the ant lineage. To identify these 
genes, we reasoned that they likely also were under positive 
selection in other insect lineages. A systematic comparison of 
the targets of positive selection from published studies in 
insects is not straightforward because genome-wide scans 
for positive selection were often performed with different 
methods in different lineages. For example, a positive selec- 
tion scan on 12 Drosophila species (all solitary) (Drosophila 12 
Genomes Consortium 2007) used the site test of Codeml 
(Yang et al. 2000), which is aimed at detecting recurrent pos- 
itive selection events affecting particular sites of a protein, 
whereas a scan on 10 bee species (including solitary, primi- 
tively social, and highly social species) (Woodard et al. 2011) 
used the branch test (Yang 1998), which tends to detect 
positive selection events affecting a large number of sites of 
a protein but during a limited period of time. To perform a 
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robust comparison of the genes that were under positive 
selection in ants and other insects, we conducted similar 
scans for positive selection in ants and the flies and bees 
outgroups. An example of genes expected to be repeatedly 
under positive selection in insects are genes involved in de- 
fense and immunity (Drosophila 12 Genomes Consortium 
2007; Bulmer 2010). On the basis of the observed smaller 
set of immunity genes in the honeybee compared with 
Drosophila melanogaster, it has been suggested that selective 
pressure on these genes might have been relaxed in social 
insects, perhaps because they have social hygienic behaviors 
(Honeybee Genome Sequencing Consortium 2006; Smith 
et al. 2008; Viljakainen et al. 2009; Smith, Zimin, et al. 2011; 
Suen et al. 2011; Harpur and Zayed 2013). However, the ad- 
dition of several newly sequenced insect genomes revealed 
that the important gene complement in fruit fly is a derived 
character (Werren et al. 2010; Fischman et al. 2011; Smith, 
Smith, et al. 2011). We used our data sets to test whether 
there was evidence for weaker positive selection on immunity 
genes in ants and bees compared with flies. 

Next, we aimed at detecting sets of genes involved in func- 
tions likely to reflect ant-specific adaptations. We focused on 
three main adaptations. The first relates to the wide range of 
coordinated collective behaviors associated with division of 
labor in ant societies. Complex cooperative behaviors occur 
among nestmates for tasks such as communal nest construc- 
tion and defense, brood rearing, social hygienic behavior, and 
collective foraging (Holldobler and Wilson 1990). It has been 
suggested that the evolution of social interactions may be 
tracked down to molecular changes affecting nervous 
system development and function. In particular it may trans- 
late into increased rates of positive selection on nervous 
system-related genes, as documented in primitively social lin- 
eages of bees, which evolved social behaviors independently 
from ants (Fischman et al. 2011; Woodard et al. 2011). 
Complex collective behaviors also require efficient communi- 
cation systems that are essentially mediated by chemical sig- 
naling in social insects. Ants identify nestmates from non- 
nestmates, as well as ants from other species, through their 
scent. Individuals also use various types of pheromones as 
alarm signals and to mark their trails and territories. It has 
therefore been suggested that genes involved in chemical 
signaling, notably pheromone production and perception, 
should experience increased positive selection in ants com- 
pared with solitary insects (Ingram et al. 2005; Robertson and 
Wanner 2006; Bonasio et al. 2010; Smith, Zimin, et al. 2011; 
Wurm et al. 2011; Kulmuni et al. 2013; Leboeuf et al. 2013). 
A manually curated data set of 873 olfactory receptor genes 
(ORs) allowed us to conduct a test for increased positive 
selection on these genes in ants. 

The second type of potential molecular adaptation relates 
to phenotypic plasticity among castes. Although queens and 
workers usually develop from totipotent eggs (Schwander 
et al. 2010), they display dramatic morphological and physi- 
ological differences. Queens are often larger, have wings, and 
have much more highly developed ovaries than workers that 
often are sterile and lack a sperm storage organ (Holldobler 
and Wilson 1990). In most species, the differences between 



castes result from developmental differences induced by en- 
vironmental factors rather than genetic differences (Abouheif 
and Wray 2002; Schwander et al. 2010; Penick et al. 2012; 
Rajakumar et al. 2012). We therefore investigated whether 
there was evidence for increased positive selection in genes 
and pathways potentially involved in developmental plasticity 
(Smith et al. 2008; Fischman et al. 2011). 

A third and interesting type of ant-specific adaptation re- 
lates to the extremely long lifespan of ant queens, which can 
live more than 20 years in some species (Keller and Genoud 
1997; Jemielity et al. 2005). This corresponds to a 100-fold 
increase in lifespan compared with solitary insects. The vari- 
ation in lifespan among castes is also remarkable, with queens 
living up to 10 times longer than workers and 500 times 
longer than males. So far, a limited number of molecular 
candidates have been identified to explain this pattern, 
mainly inspired from work in Drosophila (Jemielity et al. 
2005; Keller and Jemielity 2006). We therefore investigated 
whether there was evidence of positive selection on genes 
that have previously been associated with aging in model 
organisms. It is possible that positive selection acted on the 
same sets of genes in the bee lineage, where queens also live 
longer than other castes and than solitary insects, but such a 
signal should not be observed in short-lived species of the 
Drosophila lineage. To further assess the link between positive 
selection and aging, we investigated whether genes that ex- 
perienced positive selection in ants were genes shown in D. 
melanogaster to be differentially expressed between old and 
young individuals, and between oxygen -stressed and control 
individuals (Landis et al. 2004). 

Finally, we investigated whether there was a difference in 
the level of positive selection between genes showing biased 
expression in queens, workers, and males. The efficiency of 
natural selection acting on an advantageous mutation — and 
thus the probability of its long-term fixation — is proportional 
to its effect on fitness (Duret 2008). The fitness effects of 
mutations in genes that are expressed only in non reproduc- 
tive workers are indirect, so everything else being equal, se- 
lection should be less efficient at fixing them than mutations 
on genes expressed in queens and males. This could translate 
into lower levels of positive selection on genes expressed 
specifically in workers compared with males and queens 
(Linksvayer and Wade 2009; Hall and Goodisman 2012). We 
therefore analyzed previously published microarray data from 
the red fire ant Solenopsis invicta (Ometto et al. 2011), and 
compared the amount of positive selection between groups 
of genes varying in the level of caste-biased expression. 

Results 

Pervasive Positive Selection Detected in Ants 
To detect positive selection episodes that acted on protein- 
coding genes during the evolution of the ant lineage, the 
branch-site test of Codeml was run on 4,261 protein align- 
ments of single-copy orthologs composed of four to seven ant 
and three to five outgroup species (see Materials and 
Methods). All branches that led to ant species in each gene 
family tree (including 2 hymenopteran and 13 ant branches; 
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Fig. 1. Phylogeny of the seven sequenced ant species and the five outgroups used in this study. The maximum-likelihood phylogeny was computed by 
R. Waterhouse from the concatenated alignment of the conserved protein sequences of 2,756 single-copy orthologs from OrthoDB (Simola et al. 2013). 
The scale bar indicates the average number of amino acid substitutions per site. The phylogeny is consistent with a previously published study (Brady 
et al. 2006). A second study only found a difference in the branching of Pogonomyrmex barbatus and Solenopsis invicta (Moreau et al. 2006). The 15 
different branches where positive selection was tested are highlighted in red (the seven terminal branches leading to ant species and the branches 
numbered #1 to #8). The percentage of gene families showing positive selection in each of these branches at FDR =10% is displayed in table 2. 
Illustrations of the seven ant species and Apis mellifera are courtesy of Alexander Wild at http://www.alexanderwild.com (last accessed April 24, 2014). 
Pediculus humanus illustration was downloaded from Vectorbase, Drosophiia melanogaster, Tribolium castaneum, and Nasonia vitripennis illustrations 
were downloaded from Wikipedia. Illustrations are not to scale. 



fig. 1) were successively tested for the presence of episodic 
positive selection. As many as 1,832 single-copy orthologs 
families (43%) displayed a signal of positive selection (at 
10% false discovery rate [FDR]) in at least one of the branches 
tested (supplementary table S1, Supplementary Material 
online). In 91% of the significant alignments, at least one 
residue targeted by positive selection could be identified 
with a posterior probability greater than 0.9 (Bayes 
Empirical Bayes test; fig. 2) (Yang et al. 2005). There was ev- 
idence for positive selection in at least one branch of the ant 
lineage for 830 (20%) of the genes analyzed. For 74% of them 
positive selection was specific to ants and not observed in the 
basal hymenopteran branches #7 and #8 (fig. 1). The 10 gene 
families with the most significant test values in the ant lineage 
are given in table 1. 

The proportion of positively selected genes varied signifi- 
cantly across the different branches tested (% 2 test, P < 1e-15; 
table 2), similarly to previous analyses with experimental and 
simulated data sets. This likely results at least in part from 
lower power of the branch-site test in shorter branches 
(Anisimova and Yang 2007; Kosiol et al. 2008; Studer et al. 



2008; Fletcher and Yang 2010; George et al. 2011; Gharib and 
Robinson-Rechavi 2013). Consistent with this view, there was 
a significant correlation in our data set between the length of 
tested branches and the test score (log-likelihood ratio; 
Spearman correlation p = 0.41, P< 1e-15). Additional analy- 
ses ruled out the hypothesis that false positives caused by 
convergence problems of the test, selective constraints acting 
on synonymous sites, saturation of synonymous substitution 
rate d s , or sequencing errors could be responsible for this 
pattern (supplementary text, Supplementary Material 
online). 

Taken together, these results demonstrate that positive 
selection was common in the evolution of the ant genes. 
The proportion of significant genes was similar in magnitude 
in the outgroup data set of 10 bees analyzed with the same 
methodology (20%; supplementary table S2, Supplementary 
Material online), but even higher in the outgroup data set of 
12 flies (36%; supplementary table S3, Supplementary 
Material online). This difference might reflect biological dif- 
ferences between the lineages, such as effective population 
size N E , but also differences in the topology and branch 
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Table 2. Amount of Positive Selection Detected on Different Branches of the Analyzed Phylogeny. 


Ri*an/*h Mam©** 
Dlctlll.ll INdlllC 


LI Hedge U/clll IcdlcU 


Pt*3/~f"irtn r»f Prtcifiwolv 
rlaUHJII Ul 1 UilLIVciy 

Selected Gene Families 


Miimhpr r»f Pr»cif"i\/<=>l\/ 
INUIIIUcl Ul 1 UblLIVcly 

Selected Gene Families' 3 


Acep 


Atta cephalotes 


0.056 


144 


Aech 


Acromyrmex echinatior 


0.043 


109 


Sinv 


Solenopsis invicta 


0.029 


85 


Pbar 


Pogonomyrmex barbatus 


0.038 


80 


Cflo 


Camponotus floridae 


0.017 


65 


Lhum 


Linepithema humile 


0.036 


97 


Hsal 


Harpegnathos saltator 


0.020 


76 


1 


Attini 


0.0088 


16 


2 


Myrmicinae 


0.0071 


10 


3 


Myrmicinae 


0.0087 


16 


4 


Formicoid 


0.0072 


17 


5 


Formicoid 


0.025 


58 


6 


Formicidae 


0.030 


87 


7 


Aculeata 


0.10 


176 


8 


Hymenoptera/Apocrita 


0.39 


762 


All above branches except 7 and 8 


Formicidae 


0.20 


830 


All above branches 


Hymenoptera/Apocrita 


0.43 


1,832 



a As illustrated in figure 1. 

b Branches of gene families trees can be merged if genes are missing (or removed for quality reasons), and the resulting branches do not correspond to canonical branches defined 
by the species topology (fig. 1). When positive selection is found on such branches, it was not counted in branch-specific numbers displayed in table 2, but it was counted when 
a whole lineage was considered (e.g., Hymenoptera). 



lengths of the species trees, which influence the power to 
detect positive selection events in protein alignments (see 
supplementary text, Supplementary Material online). 

To compare the amount of positive selection experienced 
by different functional categories of genes, we classified genes 
based on their Gene Ontology (GO) annotation in D. mela- 
nogaster orthologs, and performed a gene set enrichment test 
using for each gene family a score reflecting the overall oc- 
currence of positive selection in the ant lineage (Materials and 
Methods; supplementary text, Supplementary Material 
online). Such an approach of grouping genes enables a 
more sensitive search for positive selection, while buffering 
the impact of potential false positives (e.g., from remaining 
alignment errors or GC-biased gene conversion events which 
are difficult to distinguish from real positive selection signals; 
see supplementary text and supplementary tables S20-S24, 
Supplementary Material online). Twenty-four functional cat- 
egories of genes were significantly enriched for positively se- 
lected genes in the ant lineage (at 20% FDR; table 3). A large 
number of them (11 out of 24) were related to mitochondria 
and mitochondrial activity. The other significant categories 
were related to nervous system development, behavior, im- 
munity, protein translation and degradation, cell membrane, 
and receptor activity. Thus, positive selection apparently tar- 
geted a diverse array of gene functions during the evolution of 
the ant lineage. 

Usual Targets of Positive Selection in Insects 
To identify GO categories that experienced positive selection 
not only in ants but also in other insects, we reanalyzed the fly 
and bee data sets with the same methodology used for the 



ant data set. These analyses revealed 106 GO categories sig- 
nificantly enriched for flies and 38 for bees (tables 4 and 5; 
supplementary tables S4 and S5, Supplementary Material 
online). We investigated which categories were enriched for 
positively selected genes in the three lineages. The first group 
of genes commonly enriched in ants, flies, and bees was re- 
lated to proteolysis. This group included 4 of the 24 signifi- 
cantly enriched GO categories in ants ("proteolysis," 
"metallopeptidase activity," "peptidase regulator activity," 
and "hydro-lyase activity"), 8 of the 106 GO categories 
enriched in flies ("serine-type endopeptidase activity," "endo- 
peptidase activity," "proteolysis," "metalloendopeptidase ac- 
tivity," "peptidase activity," "peptidase activity, acting on 
L-amino acid peptides," "metallopeptidase activity," and "exo- 
peptidase activity"), and 6 of the 38 GO categories enriched 
bees ("amine metabolic process," "metalloendopeptidase 
activity," "metallopeptidase activity," "signal transduction," 
"cellular amine metabolic process," and "cellular amino acid 
metabolic process"). 

The second group of genes enriched for positive selection 
signal in ants, flies, and bees was involved in response to 
stimuli. There was an enrichment of the GO category "recep- 
tor activity" in the three lineages as well as the GO categories 
"transmembrane receptor activity" and "olfactory receptor 
activity" in flies. This class of genes plays a pivotal role in 
the interactions between individuals and their environment. 
In addition, the GO categories "response to biotic stimulus" 
and "response to other organism" were enriched in flies, and 
the GO category "response to hormone stimulus" was en- 
riched in bees. In ants, "response to ecdysone" and "response 
to steroid hormone stimulus" were marginally significant 
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Table 3. GO Categories Enriched for Positively Selected Genes, Based on Scores from the Branch-Site Test from Codeml in Ants. 



SetID 


Onfnlnov 


Qpf~Klamp 

JvTLI \di 1 IC 






P- value 


FDR 


VJUiUUUvJ ■ J 


cc 


OKP'anpllni" fi hfuninp 

V7I JjCll ICIICll 1 IUU3UIMC 


59 


26.8 


1.4e-10 


o 


GO:0006120 


BP 


Mitochondrial electron transport, NADH to ubiquinone 


18 


10.6 


1.1 e-9 


0 


G&0005759 


cc 


MifnrhnnHrial mafriy 

f VlllVS^.1 1 vS 1 IUI ICtl 1 1 ICtll 1 A 


98 


39.7 


1.6e-9 


o 


GO0005762 


CC 


MifnrhnnHrial lai*pp rihnsnmal ^nhiinif 

1 VII LUVI IUI IUI Idl ICllgW 1 IL/Ujvl 1 Idl 3 14 LS 14 1 III 


36 


16.8 


1.1 e-7 


0.0025 


C1O*000 < »746 


CC 


/\Aifn<~hnnHi*i;il t*pcnit*al"ni*v rhain 

IVIIIASI.I IVJI IUI ICtl 1 CS|sll CtLlSI y ^.1 ICtll 1 


31 


14.6 


4.5e-7 


0.0033 


C1O*000 < »747 


CC 


/\Aifn<~hnnHi*inl t*pcnit*al"ni*v rhain rnmnlpy 1 

1 V1IHJI.I IvJI IUI ICll 1 COJ-/II CtLUI y LI ICtll 1 LUI 1 1 J-/IVTA 1 


22 


11.0 


1.3e-6 


0.0033 


nooooai^ 


MF 


NADH Hphx/Hfocrpnacip ^iiHinninnnp^ arfix/ifv 
iNrtu/n uci lyui ugci ictoc ^uuiuum iui icy ctLLiviLy 


16 


8.0 


3.2e~5 


0.013 


GO*000^763 


cc 


MifnrhnnHt* ial small rihn^nmal suhnnif 

f VIILUv.1 IUI IUI ICtl 311 ICtll 1 1 UUjUI 1 ICtl J 14 L/ 14 1 ML 


25 


10.9 


0 00018 

V/ivV/v 1 O 


0.047 


GO*0008038 


BP 


Npnrnn rprnonif inn 


19 


8.7 


0 00023 


0.047 


GO-0008344 


BP 


AH ii If ln<*nmnf"ni"\/ Kphjivini* 

MUUIl IwLUI 1 iwlwi y U\Zl Id V IUI 


19 


8.4 


0 00082 


0.086 


GO*0042254 


BP 


Ribnsnmp hinirpnp^K 

IMUUJUIIIC 1/IUgWl IVJlJ 


39 


15.0 


0.001 1 


0.099 


GO:0003735 


MF 


Structural constituent of ribosome 


107 


36.4 


0.0012 


0.099 


GO0044459 


cc 


Plasma mpmhnnp nart" 

1 ICI3I 1 let 1 1 Iwl 1 IUI Ctl Iw LPCtl I 


129 


42.9 


0.0016 


0.12 


GO:0006508 


BP 


Proteolysis 


145 


47.4 


0.0022 


0.14 


GO:0006412 


BP 


Translation 


191 


61.0 


0.0025 


0.15 


GO:00 16491 


MF 


Oxidoreductase activity 


127 


41.8 


0.0028 


0.15 


GO:0004872 


MF 


Receptor activity 


90 


30.6 


0.0028 


0.15 


GO:0055114 


BP 


Oxidation-reduction process 


129 


42.2 


0.0038 


0.16 




/VAC 

/VI r 


Metallopeptidase activity 


36 


13.6 


0.0039 


0.16 


OU:UUo I 1 34 


/VI r 


Peptidase regulator activity 


1 "7 


"7 "t 


n nnx/c 
U.UU4o 


0.18 


GO:0002520 


BP 


Immune system development 


26 


10.2 


0.0053 


0.19 


GO:0048534 


BP 


Hemopoietic or lymphoid organ development 


26 


10.2 


0.0053 


0.19 


GO:0016616 


MF 


Oxidoreductase activity, acting on the CH-OH group of donors, 
NAD, or NADP as acceptor 


18 


7.5 


0.0053 


0.19 


GO:0016836 


MF 


Hydro-lyase activity 


14 


6.1 


0.0055 


0.19 



Note. — The enrichment test considers a combined score for all analyzed branches of the ant lineage (Materials and Methods). The full table of results is shown in supplementary 
table S6, Supplementary Material online. 



(FDR = 21%; supplementary table S6, Supplementary Material 
online). 

Some functions were enriched for positively selected genes 
in only two of the three lineages. These included GO catego- 
ries related to immunity that were enriched in ants and flies, 
and some categories related to metabolism which were en- 
riched in flies and bees. Evidence for positive selection on 
immunity-related functions in ants came from a significant 
enrichment of the GO categories "immune system develop- 
ment" and "hemopoietic or lymphoid organ development," 
the organ that produces during larval development the cells 
mediating the immune response in insects (Corley and Lavine 
2006). Seven GO categories related to immunity were also 
enriched in flies ("defense response," "immune system pro- 
cess," "regulation of antimicrobial humoral response," "regu- 
lation of immune effector process," "regulation of defense 
response," "immune response," and "antimicrobial humoral 
response"). The absence of significant enrichment for related 
categories in bees might reflect a lack of power of the gene set 
enrichment test, because the set of immunity genes is small in 
the honeybee (Honeybee Genome Sequencing Consortium 
2006) and the data set analyzed was further depleted in genes 
with immunity-related functions (supplementary text and 
table S7, Supplementary Material online). Consistent with 
this interpretation, there was a trend in the direction of an 



enrichment, although nonsignificant, for 5 of the 7 tested GO 
categories related to immunity in bees (data not shown), 
suggesting that immunity might be a common target of pos- 
itive selection in insects. 

The second set of GO categories enriched in two of the 
three insect lineages included various metabolic processes 
and their regulators, with metabolism of chitin, aminoglycan, 
carbohydrate, polysaccharide, glucose, hexose, glycerolipid, 
and phosphatidylinositol being enriched in flies and metab- 
olism of lipid, amino-acid, nucleotide, and phosphorus being 
enriched in bees. There was no significant enrichment for GO 
categories related to metabolism in ants, but some categories 
were close to significance (e.g., "chitin metabolic process" and 
"rRNA metabolic process," with FDR = 21% and 24%, respec- 
tively; supplementary table S6, Supplementary Material 
online). Metabolic functions, such as amino-acid, fatty acid, 
lipid, or RNA metabolism, were significantly enriched in ants 
when we used KEGG pathways annotation instead of the GO 
to perform the gene set enrichment test, as well as when the 
single-copy orthologs data set was reanalyzed with another 
multiple alignment method and a different quality filtering 
method (supplementary tables S25 and S26 and supplemen- 
tary text, Supplementary Material online). It thus seems that 
metabolism is a common target of positive selection in 
insects. 
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Table 4. GO Categories Enriched for Positively Selected Genes, Based on Scores from the Branch-Site Test from Codeml in Drosophila. 



SetID 


Ontology 


SetName 


SetSize 


Score 


P-value 


FDR 


GO:0006030 


BP 


Chitin metabolic process 


29 


16.2 


4.0e-6 


0.0015 


GO:0006022 


BP 


Aminoglycan metabolic process 


36 


19.4 


5.7e~6 


0.0018 


GO:0006952 


BP 


Defense response 


36 


19.2 


9.8e~6 


0.0018 


GO:0008061 


MF 


Chitin binding 


24 


13.5 


2.0e-5 


0.0020 


GO:0004252 


MF 


Serine-type endopeptidase activity 


52 


26.0 


3.3e-5 


0.0023 


GO:0008026 


MF 


ATP-dependent helicase activity 


18 


10.5 


4.0e-5 


0.0026 


GO:0004872 


MF 


Receptor activity 


13 


7.9 


8.5e-5 


0.0048 


GO:0006006 


BP 


Glucose metabolic process 


13 


7.8 


0.00021 


0.0082 


GO:0046486 


BP 


Glycerolipid metabolic process 


17 


9.7 


0.00023 


0.0090 


GO:0005819 


CC 


Spindle 


20 


10.9 


0.00046 


0.012 


GO:0004175 


MF 


Endopeptidase activity 


78 


35.9 


0.00048 


0.012 


GO:0009607 


BP 


Response to biotic stimulus 


31 


15.8 


0.00060 


0.013 


GO:0051707 


BP 


Response to other organism 


31 


15.8 


0.00060 


0.013 


GO:0006508 


BP 


Proteolysis 


136 


59.5 


0.00071 


0.014 


GO:0006007 


BP 


Glucose catabolic process 


12 


7.0 


0.00072 


0.014 


GO:0019320 


BP 


Hexose catabolic process 


12 


7.0 


0.00072 


0.014 


GO:0030312 


CC 


External encapsulating structure 


12 


7.0 


0.00074 


0.014 


GO:0015081 


MF 


Sodium ion transmembrane transporter activity 


16 


8.9 


0.00088 


0.016 


GO:0051649 


BP 


Establishment of localization in cell 


14 


7.9 


0.00093 


0.017 


GO:0007126 


BP 


Meiosis 


34 


16.9 


0.00096 


0.017 


GO:0003824 


MF 


Catalytic activity 


844 


337.0 


0.0011 


0.018 


GO:0046488 


BP 


Phosphatidyl inositol metabolic process 


11 


6.5 


0.0012 


0.018 


GO:0016490 


MF 


Structural constituent of peritrophic membrane 


11 


6.4 


0.0013 


0.018 


GO:0005975 


BP 


Carbohydrate metabolic process 


64 


29.5 


0.0015 


0.019 


GO:0004888 


MF 


Transmembrane receptor activity 


49 


23.2 


0.0016 


0.020 


GO:0051276 


BP 


Chromosome organization 


59 


27.2 


0.0020 


0.024 


GO:0008270 


MF 


Zinc ion binding 


173 


73.4 


0.0027 


0.029 


GO:0002376 


BP 


Immune system process 


43 


20.4 


0.0027 


0.029 


GO:0002759 


BP 


Regulation of antimicrobial humoral response 


11 


6.3 


0.0028 


0.029 


GO:0004984 


MF 


Olfactory receptor activity 


19 


9.9 


0.0030 


0.029 


GO:0002697 


BP 


Regulation of immune effector process 


11 


6.2 


0.0038 


0.035 


GO:0000819 


BP 


Sister chromatid segregation 


11 


6.2 


0.0038 


0.035 


GO:0007143 


BP 


Female meiosis 


11 


6.2 


0.0047 


0.040 


GO:0016021 


CC 


Integral to membrane 


224 


93.1 


0.0047 


0.040 


GO:0031347 


BP 


Regulation of defense response 


12 


6.6 


0.0055 


0.044 


GO:0015370 


MF 


Solutersodium symporter activity 


12 


6.6 


0.0061 


0.047 


GO:0000272 


BP 


Polysaccharide catabolic process 


11 


6.1 


0.0065 


0.049 


GO:0016810 


MF 


Hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds 


28 


13.6 


0.0065 


0.049 


GO:0004521 


MF 


Endoribonuclease activity 


11 


6.1 


0.0069 


0.052 


GO:0007291 


BP 


Sperm individualization 


14 


7.4 


0.0074 


0.053 


GO:0010564 


BP 


Regulation of cell cycle process 


30 


14.4 


0.0075 


0.053 


GO:0005635 


CC 


Nuclear envelope 


17 


8.7 


0.0077 


0.054 


GO:0016773 


MF 


Phosphotransferase activity, alcohol group as acceptor 


82 


36.0 


0.0077 


0.054 


GO:0051253 


BP 


Negative regulation of RNA metabolic process 


35 


16.5 


0.0080 


0.055 


GO:0007608 


BP 


Sensory perception of smell 


21 


10.5 


0.0080 


0.055 


GO:0004222 


MF 


Metal loendopeptidase activity 


26 


12.7 


0.0082 


0.055 


GO:0006807 


BP 


Nitrogen compound metabolic process 


298 


121.5 


0.0090 


0.059 


GO:0005576 


CC 


Extracellular region 


97 


41.9 


0.010 


0.066 


GO:0006814 


BP 


Sodium ion transport 


19 


9.5 


0.011 


0.068 


GO:0045132 


BP 


Meiotic chromosome segregation 


11 


5.9 


0.011 


0.070 


GO:0034641 


BP 


Cellular nitrogen compound metabolic process 


296 


120.4 


0.012 


0.072 


GO:0010629 


BP 


Negative regulation of gene expression 


42 


19.2 


0.013 


0.073 


GO:0090304 


BP 


Nucleic acid metabolic process 


162 


67.6 


0.013 


0.075 


GO:0016301 


MF 


Kinase activity 


91 


39.2 


0.013 


0.075 



(continued) 
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Table 4. Continued 



SetID 


Ontology 


SetName 


SetSize 


Score 


P-value 


FDR 


GO:0048584 


BP 


Positive regulation of response to stimulus 


11 


5.9 


0.015 


0.084 


GO:00 16798 


MF 


Hydrolase activity, acting on glycosyl bonds 


26 


12.4 


0.017 


0.086 


GO:0006139 


BP 


Nucleobase, nucleoside, nucleotide, and nucleic acid metabolic process 


236 


96.4 


0.017 


0.086 


GO:00 16491 


MF 


Oxi do reductase activity 


201 


82.6 


0.018 


0.090 


GO:0009987 


BP 


Cellular process 


790 


310.5 


0.019 


0.095 


GO:0007088 


BP 


Regulation of mitosis 


16 


8.0 


0.019 


0.095 


GO:0051783 


BP 


Regulation of nuclear division 


16 


8.0 


0.019 


0.095 


GO:0006810 


BP 


Transport 


200 


82.1 


0.021 


0.10 


GO:0051234 


BP 


Establishment of localization 


197 


80.9 


0.021 


0.10 


GO:0006066 


BP 


Alcohol metabolic process 


35 


16.1 


0.021 


0.10 


GO:0004553 


MF 


Hydrolase activity, hydrolyzing O-glycosyl compounds 


22 


10.6 


0.022 


0.11 


GO:0008233 


MF 


Peptidase activity 


26 


12.3 


0.022 


0.11 


GO:007001 1 


MF 


Peptidase activity, acting on L-amino acid peptides 


22 


10.6 


0.022 


0.11 


GO:0046914 


MF 


Transition metal ion binding 


58 


25.5 


0.022 


0.11 


GO:0050660 


MF 


Flavin adenine dinucleotide binding 


16 


8.0 


0.024 


0.11 


GO:0045892 


BP 


Negative regulation of transcription, DNA-dependent 


26 


12.2 


0.024 


0.11 


GO:0032553 


MF 


Ribonucleotide binding 


161 


66.5 


0.024 


0.11 


GO:0032555 


MF 


Purine ribonucleotide binding 


161 


66.5 


0.024 


0.11 


GO:0035639 


MF 


Purine ribonucleoside triphosphate binding 


161 


66.5 


0.024 


0.11 


GO:0006396 


BP 


RNA processing 


36 


16.4 


0.025 


0.11 


GO:0031226 


cc 


Intrinsic to plasma membrane 


30 


13.9 


0.026 


0.11 


GO:0035222 


BP 


Wing disc pattern formation 


11 


5.7 


0.026 


0.12 


GO:0007346 


BP 


Regulation of mitotic cell cycle 


31 


14.3 


0.028 


0.12 


GO:0045017 


BP 


Glycerolipid biosynthetic process 


11 


5.7 


0.030 


0.13 




DD 

br 


Immune response 


30 


13.8 


0.030 


0.13 


OU:UU44ZoZ 


DD 

br 


Cellular carbohydrate metabolic process 


A A 

44 


19.6 


0.030 


0.13 


vjU:uu l /U/o 


/vac 
/VI r 


Purine nucleotide binding 


1 CA 

lo4 


r-j r 
O/.D 


U.loZ 


n 1 a 
U. 1 4 


yj\J.\j\j lo/Ub 


/VAC 

/VI r 


Oxidoreductase activity, acting on paired donors, with incorporation 
or reduction of molecular oxygen 


T 1 

Z I 


inn 
IU.U 


U.U3Z 


n 1 a 
U. 1 4 


GO:0005524 


MF 


ATP binding 


163 


67.0 


0.034 


0.14 


GO:0030554 


MF 


Adenyl nucleotide binding 


163 


67.0 


0.034 


0.14 


GO:0032559 


MF 


Adenyl ribonucleotide binding 


163 


67.0 


0.034 


0.14 


GO:0008237 


MF 


Metal lopeptidase activity 


12 


6.1 


0.034 


0.14 


GO:0007127 


BP 


Meiosis I 


18 


8.7 


0.035 


0.14 


GO:0019730 


BP 


Antimicrobial humoral response 


14 


6.9 


0.039 


0.15 


GO:0005815 


CC 


Microtubule organizing center 


16 


7.8 


0.041 


0.16 


GO:0055114 


BP 


Oxidation— reduction process 


167 


68.3 


0.043 


0.17 


GO:0019899 


MF 


Enzyme binding 


14 


6.9 


0.044 


0.17 


GO:0048232 


BP 


Male gamete generation 


45 


19.8 


0.045 


0.17 


GO:0008033 


BP 


tRNA processing 


17 


8.2 


0.045 


0.17 


GO:0005887 


CC 


Integral to plasma membrane 


29 


13.2 


0.046 


0.17 


GO:0044281 


BP 


Small molecule metabolic process 


212 


85.8 


0.046 


0.17 


GO:0008238 


MF 


Exopeptidase activity 


18 


8.6 


0.046 


0.17 


GO:0051179 


BP 


Localization 


236 


95.1 


0.047 


0.17 


GO:0007283 


BP 


Spermatogenesis 


44 


19.3 


0.047 


0.18 


GO:0050662 


MF 


Coenzyme binding 


48 


21.0 


0.048 


0.18 


yjVj.VVoHH/V 


Dl 


ncRNA processing 


Z/ 


IZo 


n r\AQ 

U.UHzf 


U. 1 o 




Dl 


Spermatid differentiation 


~iA 


i i . i 


U.UDU 


nio 
U. 1 o 


vjU:UU4b/oo 


DD 

br 


Negative regulation of cell cycle 


1 1 


5.5 


0.050 


0.18 


CCi-ClClACQ'iA 

\J\J.\i\JHOJOH 


RP 

Dl 


INcgdUVc rcgUlaUUFI Ul nuClcUUcoc, FlUCIcUdlUc, FlUCIcULlUc, dnu 

nucleic acid metabolic process 


41 


io 1 
1 o. 1 


n n^? 

U.UDZ 


n iq 

U. 1 o 


GO:0010639 


BP 


Negative regulation of organelle organization 


12 


5.9 


0.055 


0.19 


GO:0015631 


MF 


Tubulin binding 


12 


5.9 


0.056 


0.19 


GO:0005549 


MF 


Odorant binding 


43 


18.8 


0.058 


0.20 



Note. — Depletion results are shown in supplementary table S4, Supplementary Material online. 
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Table 5. GO Categories Enriched for Positively Selected Genes, Based on Scores from the Branch-Site Test from Codeml in Bees. 



SetID 


Ontology 


SetName 


SetSize 


Score 


P-value 


FDR 


GO:0005099 


MF 


Ras GTPase activator activity 


11 


6.0 


1.1 e-5 


0.03 


GO:0005083 


MF 


Small GTPase regulator activity 


18 


8.1 


0.00010 


0.041 


GO:0004872 


MF 


Receptor activity 


16 


7.2 


0.00020 


0.041 


GO:0022836 


MF 


Gated channel activity 


11 


5.4 


0.00021 


0.041 


GO:0006399 


BP 


tRNA metabolic process 


26 


10.3 


0.00040 


0.053 


GO:0071842 


BP 


Cellular component organization at cellular level 


190 


55.5 


0.0013 


0.065 


GO:0006418 


BP 


tRNA aminoacylation for protein translation 


19 


7.6 


0.0021 


0.084 


GO:0009725 


BP 


Response to hormone stimulus 


11 


4.9 


0.0022 


0.084 


GO:0005635 


CC 


Nuclear envelope 


13 


5.6 


0.0022 


0.084 


GO:0032507 


BP 


Maintenance of protein location in cell 


12 


5.3 


0.0023 


0.084 


GO:0051336 


BP 


Regulation of hydrolase activity 


20 


7.9 


0.0026 


0.089 


GO:0006629 


BP 


Lipid metabolic process 


51 


17.0 


0.0031 


0.095 


GO:0031072 


MF 


Heat shock protein binding 


17 


6.7 


0.0046 


0.11 


GO:0008152 


BP 


Metabolic process 


335 


91.3 


0.0070 


0.14 


GO:0004812 


MF 


Aminoacyl-tRNA ligase activity 


19 


7.2 


0.0075 


0.14 


GO:0016740 


MF 


Transferase activity 


211 


59.1 


0.0087 


0.14 


GO:0019899 


MF 


Enzyme binding 


19 


7.1 


0.0097 


0.14 


GO:0005216 


MF 


Ion channel activity 


14 


5.5 


0.011 


0.14 


GO:0022838 


MF 


Substrate-specific channel activity 


14 


5.5 


0.011 


0.14 


GO:0009308 


BP 


Amine metabolic process 


47 


15.2 


0.011 


0.14 


GO:0004222 


MF 


Metal loendopeptidase activity 


15 


5.8 


0.012 


0.14 


GO:0005938 


CC 


Cell cortex 


15 


5.8 


0.012 


0.14 


GO:0008237 


MF 


Metal lopeptidase activity 


23 


8.2 


0.012 


0.14 


GO:0007275 


BP 


Multicellular organismal development 


274 


74.9 


0.013 


0.14 


GO:0007165 


BP 


Signal transduction 


97 


28.8 


0.013 


0.14 


GO:0044106 


BP 


Cellular amine metabolic process 


40 


12.9 


0.019 


0.19 


GO:0044459 


CC 


Plasma membrane part 


45 


14.3 


0.021 


0.19 


GO:0006520 


BP 


Cellular amino acid metabolic process 


32 


10.6 


0.021 


0.19 


GO:0032879 


BP 


Regulation of localization 


36 


11.7 


0.022 


0.19 


GO:0006140 


BP 


Regulation of nucleotide metabolic process 


13 


5.0 


0.022 


0.19 


GO:0030811 


BP 


Regulation of nucleotide catabolic process 


13 


5.0 


0.022 


0.19 


GO:0033121 


BP 


Regulation of purine nucleotide catabolic process 


13 


5.0 


0.022 


0.19 


GO:0033124 


BP 


Regulation of GTP catabolic process 


13 


5.0 


0.022 


0.19 


GO:0043087 


BP 


Regulation of GTPase activity 


13 


5.0 


0.022 


0.19 


GO:0006793 


BP 


Phosphorus metabolic process 


97 


28.3 


0.023 


0.19 


GO:0006796 


BP 


Phosphate metabolic process 


97 


28.3 


0.023 


0.19 


GO:0042578 


MF 


Phosphoric ester hydrolase activity 


42 


13.4 


0.024 


0.19 


GO:0016758 


MF 


Transferase activity, transferring hexosyl groups 


26 


8.8 


0.024 


0.19 


Note. — Depletion results 


are shown 


in supplementary table S5, Supplementary Material online. 











Social Behaviors 

Two of the GO categories enriched in ants ("neuron recog- 
nition" and "adult locomotor/ behavior") might potentially 
be linked to the evolution of neural systems and behavior 
(table 3). The first category was "neuron recognition." 
However, GO categories related to neural systems were also 
enriched in a nonsocial hymenoptera lineage ("regulation of 
synaptogenesis/' "mushroom body development/' and 
"memory" on branch #8, basal to the Hymenoptera; fig. 1 
and supplementary table S9, Supplementary Material 
online), and in the branches leading to primitively social lin- 
eages of bees ("synapse," "synapse organization," "regulation 
of synaptic growth at neuromuscular junction"; data not 
shown) (also reported in Woodard et al. 2011), suggesting 



that positive selection on neural system genes in ants might 
not be directly associated with the emergence of social 
behaviors. 

The second GO category enriched in ants was "adult lo~ 
comotory behavior." This category was not enriched in any of 
the other tested lineages. The three genes contributing most 
to the positive selection signal in this GO category were DCX- 
EMAP, turtle, and beethoven. Mutational analyses of these 
genes in Drosophila suggest that they play an important 
role in sensory perception functions. Adult flies carrying a 
piggyBac insertion in DCX-EMAP are uncoordinated and 
deaf and display loss of mechanosensory transduction and 
amplification (Bechstedt et al. 2010). Turtle plays an essential 
role in the execution of coordinated motor output in 
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Fig. 3. Positive selection in the olfactory receptors gene family. Phylogenetic tree of manually annotated protein-coding sequences of olfactory receptors 
genes, including 291 genes from Pogonomyrmex barbatus in blue, 320 genes from Linepithema humile in green, and 262 genes from Nasonia vitripennis 
in black. The topology of the tree depicts the assemblage of 16 subtrees where positive selection was tested using the branch-site test of Codeml 
(Materials and Methods). Tested branches are depicted in gray if there was no evidence for positive selection and in red if there was significant evidence 
for positive selection at 10% FDR. Untested branches are depicted in black. Scale bar indicates the number of amino acid substitutions per site. 



complex behaviors in flies, notably regarding the response to 
tactile stimulation (Bodily et al. 2001). Finally, beethoven is 
involved in male courtship behavior, adult walking behavior, 
and sensory perception of sound in flies (Tauber and Eberl 
2001). This suggests that positive selection might have been 
important for the evolution of sensory perception functions 
in ants. 

A specific analysis of ORs did not provide support for the 
evolution of sociality being associated with increased levels of 
positive selection on ORs. A scan for positive selection across 
branches of a tree gathering 873 manually annotated ORs 
from two ants (Pogonomyrmex barbatus and Linepithema 
humile) and the solitary wasp Nasonia vitripennis (see 



Materials and Methods) revealed that positive selection was 
pervasive, with 277 branches (23%) displaying significant sig- 
nals for positive selection (fig. 3 and supplementary fig. S1, 
Supplementary Material online). However, positive selection 
was detected in only 19% of the 929 branches leading to ant 
species, whereas as many as 40% of the 156 branches leading 
to wasps were under positive selection (Fisher's exact test 
P = 7.6e-9). 

Phenotypic Plasticity among Castes 

None of the GO categories enriched for positively selected 

genes in ants could be linked to phenotypic plasticity (i.e., 
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caste differences). In particular, there was no evidence of a 
significant enrichment for GO categories related to morphol- 
ogy or morphogenesis in the ant lineage. Another enrichment 
test using annotations obtained from mutant phenotypes in 
D. melanogaster (which are more relevant than GO annota- 
tions to analyze genes involved in morphogenesis since genes 
sets mostly refer to anatomical structures) also provided no 
clear support for positive selection on genes associated with 
phenotypic plasticity in ants (supplementary table S12, 
Supplementary Material online). 

However, among the genes with the highest support for 
positive selection in ants (table 1), two genes had a role in 
wing development (Guanine nucleotide exchange factor in 
mesoderm and Methylthioribose~l -phosphate isomerase) and 
one in larval development (D/s3), suggesting that even if pos- 
itive selection did not act consistently on large sets of genes 
related to morphogenesis, it could have acted specifically on a 
few individual genes. 

Mitochondrial Genes 

Eleven GO categories enriched for positively selected genes in 
ants were related to mitochondrial activity (e.g., "mitochon- 
drial electron transport," "mitochondrial matrix," "mitochon- 
drial respiratory chain," "NADH dehydrogenase ubiquinone 
activity," and "oxidoreductase activity"; table 3). The mito- 
chondrial processes under positive selection were not re- 
stricted to respiration and energy production, but also 
included translation ("organellar ribosome," "mitochondrial 
small/large ribosomal subunit"). GO categories related to mi- 
tochondria were also enriched for positively selected genes on 
many individual branches of the ant lineage analyzed sepa- 
rately (supplementary table S9, Supplementary Material 
online), and in a larger data set including duplicated genes 
analyzed with the site test (see Materials and Methods; table 
1, supplementary tables S13 and S19, Supplementary Material 
online). This indicates that recurrent events of positive 
selection occurred on genes with mitochondrial functions 
during the evolution of the ant lineage. In contrast, 
mitochondria-related GO categories did not display any en- 
richment for positively selected genes in flies and bees 
(tables 4 and 5), despite a high power to detect it on the 
respective data sets (supplementary text, Supplementary 
Material online). Similarly, no mitochondrial function was 
significantly enriched in the branches #7 and #8, basal to 
the ant lineage (fig. 1 and supplementary table S9, 
Supplementary Material online), reinforcing the idea that in- 
creased positive selection on mitochondria is restricted to the 
ant lineage. 

Of note, none of the 13 protein coding genes (Oliveira et al. 
2008; Gotzek et al. 2010) from the mitochondrial genome was 
included in our main data set because the mitochondrial 
genomes of most of the ant species analyzed were not anno- 
tated. Our results thus reflect positive selection on nuclear 
genes encoding proteins that function in the mitochondrion. 
We annotated mitochondrial genomes in 5 of the 7 ant spe- 
cies analyzed and tested whether positive selection could also 
be detected on the mitochondrial genomes themselves 



(Materials and Methods) (Gerber et al. 2001; Bazin et al. 
2006; Meiklejohn et al. 2007). However, we did not find evi- 
dence for positive selection on these alignments, neither with 
the branch-site test (supplementary table S15, Supplementary 
Material online) nor with the site test (supplementary table 
S14, Supplementary Material online). 

Lifespan Genes 

There was a significant enrichment for positively selected 
genes in the ant orthologs of D. melanogaster genes that 
were downregulated in 61 -day-old flies compared with 10- 
day-old flies, based on a published microarray analysis 
(P = 0.011; below Bonferroni threshold a = 0.05/4 = 0.01 25; 
supplementary table S16, Supplementary Material online) 
(Landis et al. 2004). 

Two other genes known to be involved in aging were 
among the top-scoring genes for positive selection in our 
data set. The first was Tequila, which has been shown to be 
associated with aging in an experimental evolution study in D. 
melanogaster (Remolina et al. 2012). The other was 
mitochondrial Afunctional protein a subunit, whose knock- 
out also reduces lifespan in D. melanogaster (Kishita et al. 
2012). Although not in the list of top hits, Sod2 (superoxide 
dismutase [A/In], mitochondrial), a gene known to have anti- 
oxidant activity and whose overexpression has been shown to 
be associated with increased lifespan in some strains of D. 
melanogaster (Mockett et al. 1 999; Curtis et al. 2007), under- 
went positive selection at the base of the Hymenoptera lin- 
eage (FDR = 0.0073) and in the Acromyrmex echinatior 
branch (FDR = 9.6e-8). 

Selective Pressure on Genes with Caste-Biased 
Expression 

There was a marginally significant enrichment for positively 
selected genes among genes with biased expression in adult 
workers in S. invicta (effect size = 1.2, P = 0.025; not significant 
after Bonferroni correction a - 0.05/6 = 0.0083; supplemen- 
tary table S17, Supplementary Material online) and a stronger 
enrichment for genes with queen-biased expression in adults 
(effect size = 1.8, P = 0.0028). Surprisingly, however, there was 
a pattern of weaker enrichment for genes with male-biased 
expression in adults (effect size =1.04, P = 0.2381). At the 
pupal stage, we did not detect a significant enrichment for 
positively selected genes among any group of genes showing 
caste-biased expression. But similarly to the adult stage, 
the enrichment effect size was higher for genes with queen- 
biased expression (effect size =1.2) than for genes with 
worker-biased expression (effect size =1.1), and it was the 
lowest for genes showing male-biased expression (effect 
size =1.06). 

Discussion 

In this article, we report results from a genome-wide scan for 
positive selection in protein-coding sequences of seven ant 
genomes, using the rigorous branch-site model of Codeml 
(Zhang et al. 2005) with stringent data quality control. 
Positive selection was detected in the ant lineage for 20% of 
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the gene families analyzed. This proportion is similar in mag- 
nitude to the values observed in the other two insect lineages 
that we reanalyzed in this study: 20% in the 10 bee species and 
36% in the 12 flies species. 

Our analysis identified similarities in patterns of positive 
selection between the ants and other insect lineages. Notably, 
at the broadest phylogenetic scale that our data sets allowed 
us to study, functional categories related to proteolysis, me- 
tabolism, response to stimuli, and immunity, were enriched 
for positively selected genes in ants, bees, and flies. 
Interestingly, studies in mammals, fishes, and urchins also 
provided evidence for positive selection on similar functional 
categories (Kosiol et al. 2008; Studer et al. 2008; Oliver et al. 
2010; Montoya- Burgos 2011). Recurrent positive selection on 
such long evolutionary time scales is typical of genes involved 
in the interaction with changing environments or in conflict 
and competition, such as evolutionary arms races between 
sexes or between different species, which cause the perpetu- 
ation of adaptations and counter-adaptations in competing 
sets of coevolving genes (Dawkins and Krebs 1979). It is no- 
table that positive selection patterns on these categories of 
genes do not seem to reflect or be strongly affected by the 
large life-history differences between lineages analyzed here, 
for example the evolution of eusociality in the hymenopteran 
lineages. In particular, our results on immunity-related genes 
challenge the hypothesis that hygienic behaviors in social in- 
sects could have relaxed the selective pressure on immune 
genes, since this should be reflected in reduced levels of pos- 
itive selection on these genes (Honeybee Genome Sequencing 
Consortium 2006; Smith et al. 2008; Viljakainen et al. 2009; 
Werren et al. 2010; Fischman et al. 2011; Smith, Zimin, et al. 
2011; Smith, Smith, et al. 2011; Suen et al. 2011; Harpur and 
Zayed 2013). 

Our analysis indicated that genes involved in neurogenesis 
were under positive selection in ants and the primitively social 
lineages of bees. It was previously hypothesized that stronger 
selection on genes related to brain function and development 
should be observed in eusocial Hymenoptera species due to 
high cognitive demands associated with social life (Fischman 
et al. 2011). However, our results are not consistent with this 
prediction because we also uncovered signs of positive selec- 
tion at the base of the Hymenoptera lineage, i.e., before the 
evolution of sociality. Interestingly, a similar pattern had pre- 
viously been reported with brain morphological data. 
A comparative analysis of insects showed that the size of 
mushroom body started to increase at the base of the 
Euhymenopteran (Orussioidea + Apocrita) lineage, approxi- 
mately 90 My before the evolution of sociality in the Aculeata, 
and that there was no clear correlation between the size of 
brain components and the levels of sociality or cognition 
capabilities (Farris and Schulmeister 2011; Lihoreau et al. 
2012). To account for this observation, Fischman et al. 
(2011) tried to identify factors, other than sociality, that 
may have placed unique selective pressure on brain evolution 
in species of the Hymenoptera lineage. Based on the obser- 
vation that there was less positive selection on neurogenesis 
genes in highly social bees than in primitively social bees, they 
proposed that cognitive challenges might be associated with 



the mode of colony founding in social Hymenoptera. In par- 
ticular, primitively social bees, which transit from a solitary 
phase during the process of colony founding to a social phase, 
could experience higher cognitive needs than highly social 
bees, which never go through a solitary phase. However, 
our results are also inconsistent with this model since in- 
creased positive selection was observed before the evolution 
of sociality in Hymenoptera. A comprehensive survey of pos- 
itive selection on neurogenesis genes in Hymenoptera species, 
including species basal to the lineage, is required to identify 
precisely when the selective regime of these genes started to 
change, and in which hymenopteran sublineages it was 
maintained. 

Our results also challenge the hypothesis that genes in- 
volved in chemical signaling experienced increased positive 
selection in social insects (Ingram et al. 2005; Robertson and 
Wanner 2006; Bonasio et al. 2010; Smith, Zimin, et al. 2011; 
Wurm et al. 2011; Zhou et al. 2012; Leboeuf et al. 2013). The 
analysis of olfactory receptor repertoires in two ants and a 
nonsocial wasp indicates that positive selection on amino- 
acid substitutions was surprisingly less frequent in ant than in 
wasp branches. Given the limited number of species used in 
this analysis, future work should concentrate on generating 
extensive annotation of olfactory receptors from more 
Hymenoptera as well as outgroup species to identify charac- 
ters or traits that could be associated with the pattern of 
positive selection on olfactory receptors. 

Although our analyses did not provide support for previ- 
ous hypotheses about the expected effect of sociality on gene 
evolution, we identified several interesting functional catego- 
ries which were enriched for positively selected genes exclu- 
sively in the ant lineage, possibly reflecting ant-specific 
adaptations. The most consistent and robust result was 
that genes functioning in the mitochondria were particularly 
likely to be under positive selection. Mitochondrial activity 
plays an important role in the process of reproductive isola- 
tion and speciation (Lee et al. 2008; Burton and Barreto 2012), 
interactions with endosymbionts such as Wolbachia (Werren 
1997), diseases (Cortopassi 2002; Richly et al. 2003; Trifunovic 
et al. 2004, 2005), and aging (Lenaz 1998; Cortopassi 2002; 
Kowald and Kirkwood 201 1). In that respect it is notable that 
the evolution of sociality has been accompanied by a nearly 
100-fold increase in lifespan of queens compared with their 
solitary ancestors (Keller and Genoud 1997; Jemielity et al. 
2005). Three lines of evidence suggest that increased lifespan 
of queens might be related to increased positive selection on 
mitochondrial genes in the ant lineage. 

First, lifespan extension, not only in insects but also in 
other lineages such as birds and bats, appears to be associated 
with decreased production of reactive oxidative species (ROS) 
(Perez-Campo et al. 1998; Brunet-Rossinni 2004; Parker et al. 
2004; Corona et al. 2005; Jemielity et al. 2005). ROS are a 
normal by-product of cellular metabolism. In particular, one 
major contributor to oxidative damage is hydrogen peroxide 
(H 2 0 2 ), which is produced from leaks of the respiratory chain 
in the mitochondria (Harman 1972; Lenaz 1998; Finkel and 
Holbrook 2000; Cui et al. 2012). Positive selection in ants on 
genes functioning in the mitochondria may thus reflect 
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selection to increase mitochondrial efficiency and reduce ROS 
production. Interestingly, positive selection on genes with 
mitochondrial functions was previously documented in the 
bat lineage (Shen et al. 2010; Zhang et al. 2013), which include 
species with exceptional longevity (Brunet-Rossinni and 
Austad 2004). In the bat Myotis lucifugus, ROS production 
was also shown to be significantly lower than in two similar 
sized mammal species (a mouse and a shrew) although the 
metabolic rates, and thus mitochondrial activity, of the 
former were much higher because of flight demands 
(Brunet-Rossinni 2004). 

Second, on the basis of gene expression data obtained in 
the fire ant S. invicta, our analyses revealed that positive se- 
lection was strongest on genes with queen-biased expression, 
intermediate on genes with worker-biased expression, and 
weakest on genes with male-biased expression. This associa- 
tion between levels of positive selection and caste-biased dif- 
ferences in gene expression cannot be simply accounted by 
differences in expression levels of mitochondrial genes (which 
are enriched for positively selected genes in ants) since in S. 
invicta mitochondrial genes are significantly less expressed in 
queens than in workers at the larval stage, and not differen- 
tially expressed at the adult stage (supplementary fig. S2, 
Supplementary Material online). The finding of higher levels 
of positive selection for genes more highly expressed in the 
castes with the longer lifespan (queens can live decades in 
some species, whereas workers have lifespan in the order of 
months, and males in the order of days) suggests that in- 
creased positive selection on queen-specific genes could be 
related to longer lifespan. 

Third, our analyses showed that the levels of positive se- 
lection were higher on orthologs of genes which are down- 
regulated during aging in flies. These genes include numerous 
energy metabolism genes, and their downregulation in old 
flies is thought to reflect a decline of normal and functional 
mitochondria with age (Yui et al. 2003; Landis et al. 2004). The 
finding of increased levels of positive selection on genes 
whose expression declines at older ages suggests that the 
function of these genes might be improved in ants, potentially 
delaying the loss of normal activity in mitochondria with age. 
It would be interesting to test if parallel mechanisms also 
evolved in the ant lineage to maintain the expression of 
these genes and delay the decline of mitochondria activity 
through lifespan in queens. 

In contrast to ants, there was no evidence of elevated 
levels of positive selection on mitochondrial functions in 
bees. As most social species, bees also evolved longer 
queen lifespans (more than 2 years) compared with 
males and workers (a few weeks) (Keller and Genoud 
1997; Munch et al. 2008). There are four possible explana- 
tions for the difference between ants and bees in the level 
of positive selection on mitochondrial genes. First, lifespan 
differences between castes are less pronounced in bees, 
where queens live up to 2-5 years, than in ants, where 
queens can live up to 30 years, possibly resulting in lower 
selective pressure to increase lifespan in bees than in ants. 
Second, because eusociality evolved independently in ants 
and bees it is possible that extended queen lifespans 



evolved by different molecular mechanisms (Jemielity 
et al. 2005; Jobson et al. 2010). For example, vitellogenin 
may play a more central role for aging in bees than ants 
(Amdam and Omholt 2002; Corona et al. 2007; Munch 
et al. 2008). Third, the evolution of mitochondria-related 
genes may have been differently constrained in ants and 
bees. For example, metabolic rates differ greatly between 
flying bee workers and non-flying ant workers because 
flight is an energetically costly behavior requiring highly 
elevated metabolic rates (Jensen and Holm-Jensen 1980; 
Suarez 2000; Niven and Scharlemann 2005). Because me- 
tabolism and mitochondrial activity are closely connected, 
lower metabolic rates in ants might have alleviated func- 
tional constraints on mitochondria-related genes, allowing 
selection to act on lifespan extension. Fourth, the GC con- 
tent in bee genomes was shown to be lower than in ant 
genomes (Honeybee Genome Sequencing Consortium 
2006; Simola et al. 2013). Some parts of the bee genomes, 
in particular their mitochondrial genomes (Crozier and 
Crozier 1993; Gotzek et al. 2010; Tan et al. 2011), display 
extreme bias in nucleotide composition, which leads to 
significant effect on both the codon usage patterns and 
amino-acid composition of proteins and may have inter- 
fered with the action of positive selection. 

If positive selection acted to optimize the functioning of 
mitochondria in ants, it could be expected that the mito- 
chondrial genome itself should be targeted by positive 
selection. However, mitochondrial genes generally exhibit 
very low d N /d s ratios (Montooth and Rand 2008) and there 
was no clear evidence in our results for positive selection on 
the 13 genes of the mitochondrial genome itself. This suggests 
that innovations related to mitochondrial activity could 
arise more easily on nuclear genes, whereas mitochondrial 
genes seem more likely to maintain conserved core 
functionalities. 

In conclusion, this study provides a detailed analysis of the 
extent of positive selection events on protein-coding genes in 
seven ant species. Because false positives are a major concern 
for whole-genome scans for positive selection, we used a con- 
servative methodology. We also reanalyzed data in bees and 
flies with the same methods to permit an unbiased and 
robust comparison of positive selection between lineages. 
The comparison between these three lineages provided inter- 
esting perspectives on the evolution of genes implicated in 
immunity, neurogenesis, and olfaction, and allowed us to 
pinpoint positive selection events that were specific to the 
ant lineage. In particular, we found that the evolution of ex- 
treme lifespan in ants was associated with positive selection 
on genes with mitochondrial functions, suggesting that a 
more efficient functioning of mitochondrial genes might 
have been an important step toward the extreme lifespan 
extension that characterizes this lineage. It would be 
interesting to complement this study by scans for genes 
under lineage-specific strong or relaxed purifying selection, 
to get a more global picture of natural selection patterns in 
ant genomes, and uncover additional genes that could have 
played a significant role during the evolution of the ant 
lineage. 
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Materials and Methods 

Single-Copy Orthologs Gene Families Data Set 
Protein -coding gene sequences of the seven ant genomes 
were downloaded from the Hymenoptera Genome 
Database (http://hymenopteragenome.org/ant_genomesA 
last accessed April 24, 2014) (Munoz-Torres et al. 2011). 

The complete annotated gene sets were OGS_1.0 for 
Acromyrmex echinatior (Nygaard et al. 2011), OGS_1.2 for 
Atta cephalotes (Suen et al. 2011), OGS_2.2.3 for Solenopsis 
invicta (Wurm et al. 2011), OGS_1.2 for Pogonomyrmex bar- 
batus (Smith, Smith, et al. 2011), OGS_3.3 for Camponotus 
floridanus (Bonasio et al. 2010), OSG_1.2 for Linepithema 
humile (Smith, Zimin, et al. 2011), and OGS_3.3 for 
Harpegnathos saltator (Bonasio et al. 2010). Coding sequences 
of five outgroup species were downloaded from the 
Hymenoptera Genome Database for the honey bee (Apis 
mellifera Amel_pre_release2) (Honeybee Genome 
Sequencing Consortium 2006) and the jewel wasp (Nasonia 
Vitripenis OGS_v1.2) (Werren et al. 2010), from Flybase 
(Tweedie et al. 2009) for the fruit fly (Drosophila melanogaster 
FB5.29) (Adams et al. 2000), from BeetleBase (Kim, Murphy, 
et al. 2010) for the flour beetle (Tribolium castaneum 
Tcas_3.0) (Tribolium Genome Sequencing Consortium 
2008), and from vectorBase (Lawson et al. 2009) for the 
body louse (Pediculus humanus PhumU1.2) (Kirkness et al. 
2010). 

Gene families were obtained from a custom run of the 
OrthoDB pipeline for the Ant Genomic Consortium 
(http://cegg.unige.ch/orthodbants and http://bioinfo.unil.ch/ 
supdata/Roux_positive_selection_ants/orthoDB_run.zip, last 
accessed April 24, 2014; pipeline of OrthoDB release 4) 
(Waterhouse, Zdobnov, Tegenfeldt, et al. 2011; Simola et al. 
2013). Briefly, OrthoDB implements a Best Reciprocal Hit 
clustering algorithm based on all-against-all Smith- 
Waterman protein sequence comparisons. The longest alter- 
natively spliced form of genes is used. The orthologous groups 
are built at different taxonomic levels and it is possible to 
query for specific phyletic profiles by combining the criteria of 
absent, present, single-copy, multicopy, or no restriction, for 
each species within the studied clade. 

Gene families including strictly one ortholog in each of the 
12 species were selected (2,756 gene families). Because anno- 
tations of the studied genomes are likely to be incomplete 
(Simola et al. 2013), families with a few missing genes — gene 
losses or unannotated genes — were included, with the restric- 
tion that at least four genes out of the seven ant species, and 
three genes out of the five outgroup species should be present 
in the gene family. Simola et al. (2013) have shown that 
among the seven ant species, there were generally few lost 
or missing genes, apart from S. invicta (less than 400 S. invicta 
genes were missing in single-copy orthologs families) and Ac. 
echinatior (<150 Ac. echinatior genes were missing in single- 
copy orthologs families). Our gene family selection criteria 
allow handling such a moderate amount of missing genes 
in families. In order to transfer functional annotations from 
D. melanogaster, only families including a fruit fly ortholog 
were retained. With these criteria, the number of OrthoDB 



groups in the data set increased to 4,337. All gene families 
were assumed to follow the species tree topology (fig. 1). 
The exclusion of families that experienced gene duplication 
facilitates the comparison of branches between gene families, 
and keeps our analysis from biases related to differential 
duplication among lineages (Waterhouse, Zdobnov, 
Kriventseva 2011) and among genes (Davis and Petrov 
2004; He and Zhang 2006), and to the consequences of 
duplication (Force et al. 1999; Brunet et al. 2006). Finally, 
results on single-copy orthologs can be easily compared 
with previously published studies using similar gene 
family topologies (Drosophila 12 Genomes Consortium 
2007; Kosiol et al. 2008; Studer et al. 2008; Lindblad-Toh 
et al. 2011). 

Basic sequence quality features were first controlled as in 
Hambuch and Parsch (2005). CDS (coding sequences) whose 
length was not a multiple of 3 or did not correspond to the 
length of the predicted protein, or that contained an internal 
stop codon, were eliminated; the longest CDS of genes show- 
ing multiple isoforms was retained; CDS shorter than 100 nt 
were eliminated. 

Because misalignment errors can be an important source 
of false positives in genome-wide scans for positive selection 
in coding sequences (Schneider et al. 2009; Markova-Raina 
and Petrov 2011; Yang and dos Reis 2011; Jordan and 
Goldman 2012), we took great care at filtering the potentially 
problematic sites in the alignments. The quality filtering pipe- 
line used here is adapted from the pipeline of the Selectome 
database release 4 (http://selectome.unil.ch, last accessed 
April 24, 2014) (Proux et al. 2009; Moretti et al. 2014). The 
multiple alignment of the protein sequences in each gene 
family was computed by M-Coffee (Wallace et al. 2006) 
from the T-Coffee package v8.93 (Notredame et al. 2000), 
which combines the output of different aligners. Similarly to 
Ensembl Compara (see http://www.ensembl.org/info/docs/ 
compara/homology_method.html [last accessed April 24, 
2014] for more details) (Vilella et al. 2009), four different 
aligners were used for M-Coffee (mafftgins_msa, muscle_msa, 
kalign_msa, and t_coffee_msa). M-Coffee outputs a consen- 
sus of four alignments from the different aligners, and a qual- 
ity score for each residue based on the concordance of the 
alignment at each position by different aligners. Scores lie 
between 0, if a residue was not aligned at the same position 
by the different aligners, and 9 if it is reliably aligned at the 
same position in all cases. Reliably aligned residues with a 
score of 7 or above were retained. We used the heuristic 
algorithm of MaxAlign v1.1 (Gouveia-Oliveira et al. 2007) to 
detect and remove sequences badly aligned as a whole (gap- 
rich sequences) in the multiple sequence alignments. When a 
sequence was removed, the gene family was realigned and 
refiltered using M-Coffee. Families left with less than four 
sequences were discarded because of insufficient power to 
detect positive selection. The protein alignments were re- 
verse-translated to nucleotide alignments using the seq_re- 
format utility of the T-Coffee package (Notredame et al. 
2000). 

We used a stringent Gblocks filtering (v0.91b; type = co- 
dons; minimum length of a block = 4; no gaps allowed) 
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(Talavera and Castresana 2007) to remove gap-rich regions 
from the alignments, as these are problematic for positive 
selection inference (Fletcher and Yang 2010; Markova-Raina 
and Petrov 2011). The large memory requirements of 
M-Coffee for long alignments led us to use only Gblocks 
without M-Coffee scoring if the length of the alignment 
was greater than 9,000 nt. 

After filtering, our data set included 4,261 gene families 
with an average of 10.4 branches per family to test (fig. 1; 
44,306 branches to test; median = 10 branches per family). 
The mean length of filtered alignment was 1,133nt (me- 
dian = 885 nt), ranging from a minimum of 54 nt to a max- 
imum of 22,248 nt. Of note, lost or missing genes in families 
affect the topology of the trees and the possibility to com- 
pare equivalent branches of different families. In total, our 
data set contains 36,681 branches (83%) in 4,256 families 
which corresponded to the canonical topology defined by 
the species tree (fig. 1) and could be compared across fam- 
ilies (e.g., table 2). 

Our analyses are likely to underestimate the genome-wide 
number of positive selection events because 1) single-copy 
orthologs tend to evolve under stronger purifying selection 
than multicopy gene families (Waterhouse, Zdobnov, 
Kriventseva 2011), 2) the ant genomes still lack good anno- 
tation of gene models and single-copy orthologs gene families 
could be missed, and 3) we filtered out unreliable parts of 
sequence alignments including fast evolving residues that are 
difficult to align (Fletcher and Yang 2010; Privman et al. 2012). 
The last point is balanced by the fact that conserved regions 
might be more prone to positively selected substitutions 
(Bazykin and Kondrashov 2012) and that the removal of 
unreliable regions seems to increase the power to detect pos- 
itive selection (Jordan and Goldman 2012; Privman et al. 
2012). 

Extensive Gene Families Data Set 
Another data set gathered all gene families from the OrthoDB 
database that could pass our quality filters, and notably fam- 
ilies that experienced duplications. The CDS were filtered as 
described earlier. Amino-acid sequences were aligned using 
PAGAN version 0.47 (Loytynoja et al. 2012). The program 
GUIDANCE (v1.1) was used to assess alignment confidence 
and mask unreliably aligned residues (Penn et al. 2010; 
Privman et al. 2012). The combination of a phylogeny- 
aware aligner (PAGAN replaces PRANK [Loytynoja and 
Goldman 2008] and is based on the same principle) and of 
this filtering algorithm was shown to perform the best in 
recent benchmark studies on simulated data (Jordan and 
Goldman 2012; Privman et al. 2012). Gene family phylogenies 
were built using RAxML (v7.2.9) (Stamatakis 2006) from the 
amino-acid sequences, with the LG matrix and the CAT 
model. Amino-acid alignments were reverse-translated into 
the corresponding codon alignments. This resulted in 6,186 
families tested, with an average of 11 genes, and an average 
length of filtered alignment of 3,129 nt (median of 2,385 nt, 
ranging from a minimum of 192 nt to a maximum of 
20,556 nt). 



Mitochondrial Gene Families Data Set 
Contigs corresponding to mitochondrial genomes could be 
downloaded for five of the seven ant genomes (Ac. echinatior, 
At cephalotes, S. invicta, P. barbatus, and L humile). They were 
submitted to MITOS, a web server for the annotation of 
metazoan mitochondrial genomes (http://mitos.bioinf.uni- 
leipzig.de/index.py, last accessed April 24, 2014) (Bernt et al. 
2012). This gave us the predicted coordinates of 13 mitochon- 
drial protein-coding genes in these species. Frameshift errors 
or incomplete gene predictions were manually corrected. 
Mitochondrial genes from the outgroup species Ap. mellifera, 
N. Vitripenis, and T. castaneum were downloaded from 
GenBank (accession numbers: L06178; EU746609.1, and 
EU746613.1; AJ312413.2 and NC_003081.2, respectively). 
Mitochondrial genes from D. melanogaster were downloaded 
from Flybase at ftp://ftp.flybase.net/genomes/Drosophila_ 
melanogaster/dme l_r5.43_FB201 2_01 /fasta/dmel-dme l_ 
mitochondrion_genome-CDS-r5.43.fasta.gz (last accessed 
April 24, 2014). The alignment and filtering steps for the 13 
mitochondrial gene families were identical to the data set of 
single-copy orthologs nuclear gene families (see above). A 
total of 119 branches were tested in this data set (average 
of 9.2 and median of 9 branches per family; average length of 
filtered alignment of 641 nt and median of =621 nt, ranging 
from a minimum of 39 nt to a maximum of 1,413 nt). 

Twelve Drosophila Data Set 

Single-copy ortholog gene families from the twelve 
sequenced Drosophila species were downloaded from ftp:// 
ftp.flybase.net/12_species_analysis/clark_eisen/alignments/ 
(last accessed April 24, 2014) (files "all_species.guide_tree.lon- 
gest.cds.tar.gz" and "al l_species.guide_tree.longest.transla- 
tion.tar.gz") (Drosophila 12 Genomes Consortium 2007). 
The alignment and filtering steps for these gene families 
were identical to the data set of single-copy ortholog gene 
families used for the ant analysis. Out of 6,698 initially down- 
loaded Drosophila gene families, 3,749 (56%) passed our filters 
and could be tested for positive selection, resulting in 77,495 
branches tested (average of 20.7 and median of 21 branches 
per family; average length of filtered alignment of 876 nt and 
median of 708 nt, ranging from a minimum of 15 nt to a 
maximum of 14,535 nt). 

Bee Data Set 

Single-copy ortholog gene families from 10 bee species were 
downloaded from http://insectsociogenomics.illinois.edu/ 
(last accessed April 24, 2014). This set of gene families is in- 
complete as it is derived from the sequencing of expressed 
sequence tags (using 454 Life Science/Roche GS-FLX plat- 
form) from nine bee species (Woodard et al. 2011), and 
from gene models of the honey bee Ap. mellifera 
(Honeybee Genome Sequencing Consortium 2006). The 
alignment and filtering steps for these gene families were 
identical to the data set of single-copy ortholog gene families 
used for the ant analysis. Out of 3,647 initially downloaded 
gene families, 2,256 (62%) passed our filters and could be 
tested for positive selection, resulting in 20,169 branches 
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tested (average of 8.9 and median of 9 branches per family; 
average length of filtered alignment of 611 nt and median of 
528 nt, ranging from a minimum of 27 nt to a maximum of 
3,945 nt). 

Branch-Site Test for Positive Selection 
We used the updated branch-site test (Zhang et al. 2005) of 
Codeml from the package PAML v4.4c (Yang 2007) to detect 
Darwinian positive selection experienced by a gene family in a 
subset of sites in a specific branch of its phylogenetic tree. This 
test was previously used in genome-wide scans for positive 
selection in various lineages (Bakewell et al. 2007; Kosiol et al. 
2008; Studer et al. 2008; Vamathevan et al. 2008; Oliver et al. 
2010; George et al. 201 1) and is used by the Selectome project 
(http://selectome.unil.ch, last accessed April 24, 2014) (Proux 
et al. 2009; Moretti et al. 2014). It is acknowledged to be more 
sensitive for the detection of positive selection than branch 
tests (Yang 1998) or site tests (Yang et al. 2000), because it 
does not average the signal over all codons in the alignment 
(branch test) nor over all branches of the phylogeny (site test) 
(Yang and dos Reis 2011). It is also robust to relaxation of 
purifying selection (co close to 1) since this scenario is ac- 
counted for in the null model (Zhang 2004; Zhang et al. 
2005). The alternative model is contrasted to the null 
model using a likelihood-ratio test (LRT), where log-likelihood 
ratios are compared to a chi-square distribution with 1 degree 
of freedom (Zhang et al. 2005). Previous studies have reported 
the branch-site test to be conservative in this setup (Bakewell 
et al. 2007; Studer et al. 2008; Fletcher and Yang 2010; Yang 
and dos Reis 2011; Gharib and Robinson-Rechavi 2013). We 
did not use the co estimates to infer the strength of positive 
selection because they were shown to be unreliable (Bakewell 
et al. 2007; Yang and dos Reis 2011). 

In the absence of a specific a priori hypothesis regarding 
which branches to test for positive selection, our implemen- 
tation runs the test multiple times on each gene family, suc- 
cessively changing the branch selected as foreground. The 
branches considered as foreground are highlighted in red in 
figure 1. This approach was shown to be legitimate if P-values 
from the successive tests are corrected for multiple testing 
(Anisimova and Yang 2007; Yang and dos Reis 2011). We 
applied a FDR correction (Benjamini and Hochberg 1995) 
over all the P-values treated as one series (number of branches 
tested x number of gene families tested). In the ant single- 
copy orthologs nuclear data set, we analyzed a maximum of 
15 branches leading to the 7 ant species, summing to 44,306 
tests performed. In the ant mitochondrial data set, we ana- 
lyzed a maximum of 11 branches leading to 5 ant species, 
summing to 119 tests (branches in red in supplementary fig. 
S3, Supplementary Material online). In the Drosophila single- 
copy orthologs data set, we analyzed a maximum of 21 
branches, leading to a total of 77,495 tests (supplementary 
fig. S4, Supplementary Material online). Finally in the bee data 
set, we analyzed a maximum of 17 branches, leading to a total 
of 20,169 tests (supplementary fig. S5, Supplementary 
Material online). 



All computations were performed using Slimcodeml (re- 
lease May 4, 2011) (Schabauer et al. 2012), an optimized ver- 
sion of Codeml, based on the release 4.4c of the PAML 
package (downloadable at http://selectome.unil.ch/cgi-bin/ 
downloadcgi, last accessed April 24, 2014). Slimcodeml was 
estimated to run the branch-site models about 1.77 times 
faster than the original Codeml thanks to the use of external 
standard libraries for linear algebra calculations and specific 
optimizations for the computer architecture used. We veri- 
fied on a subset of the gene families that the results given by 
Slimcodeml were identical with the original Codeml. 
Examples of Slimcodeml/Codeml control files used are pro- 
vided in supplementary text, Supplementary Material online. 
For the ant mitochondrial data set, Codeml was used with the 
option "icode = 4" to use the Invertebrate mitochondrial ge- 
netic code (http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/ 
wprintgc.cgi#SG5, last accessed April 24, 2014). 

The branch-site model is known to display convergence 
problems in the calculation of likelihoods (Yang and dos Reis 
2011), leading to negative or artificially large log-likelihood 
ratios. We thus launched three independent runs for both 
the alternative and null hypotheses, for each branch of each 
gene family, and kept the best likelihood value of each run to 
calculate the log-likelihood ratio (Yang and dos Reis 201 1). Of 
note, the likelihood differences observed across the three runs 
were most of the time very small. Even after reconciliation of 
three replicate runs, we still observed a number of negative 
log-likelihood ratios (8% of the tests — most of them very 
close to 0). In such cases, we manually set the log-likelihood 
ratios to 0 (meaning nonsignifkance). We recorded the larg- 
est differences in likelihood values between the three inde- 
pendent runs in both fixed and alternative models (d). The 
distribution of differences was bimodal, with a first major 
mode at d = 0, gathering most data, and a second minor 
mode at d~1. A cutoff at d = 0.004 clearly separated the 
two peaks. We used this stringent cutoff (d > 0.004) to elim- 
inate all tests with potential convergence problems in the 
fixed and alternative models (see supplementary text and 
table S23, Supplementary Material online). 

Values of d N and d s were calculated with parameters ex- 
tracted from Codeml results files (.mlc files). 

All calculations were performed on the SIB Vital-IT cluster 
in Lausanne (http://www.vital-it.ch/, last accessed April 24, 
2014). All three runs and the two hypotheses of each test 
were performed on the same node of the cluster. 

Site Test for Positive Selection 

The site test (Yang et al. 2000) of Codeml from the package 
PAML v4.4e (Yang 2007), allowing the d N /d s ratio (co) to vary 
among sites, was run on the extensive data set of 6,186 fam- 
ilies (see above). We contrasted the null model M8a (beta and 
co with co = 1) to the alternative model M8 (beta and co with 
co > 1) with 11 site classes (Swanson et al. 2003; Wong et al. 
2004). Examples of Codeml control files used are provided in 
supplementary text, Supplementary Material online. Similar 
to the branch-site test, we launched three independent runs 
for both the alternative and null hypotheses for each gene 
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family and kept the best likelihood value of each run for the 
LRT (supplementary table S19, Supplementary Material 
online). The likelihood ratios were compared to a chi- 
square distribution with 1 degree of freedom as recom- 
mended in PAML user's guide (http://abacus.gene.ucl.ac.uk/ 
software/pamlDOC.pdf, last accessed April 24, 2014). 

Reconstruction of Ancestral G + C Content 
The program nhPhyml (Galtier and Gouy 1998; Guindon and 
Gascuel 2003; Boussau and Gouy 2006) was used to estimate 
the G + C content at third codon positions at each node of 
the gene family trees (topology fixed, transition/transversion 
ratio estimated, alpha parameter estimated with eight cate- 
gories). Following Studer et al. (2008), we calculated the shift 
in GC3 content at each branch as the difference between GC3 
contents at the nodes delimitating that branch. 

Olfactory Receptors Family 

Olfactory receptors are difficult to process in automated pipe- 
lines since they are characterized by dynamic patterns of du- 
plications and pseudogenization during evolution (Nozawa 
and Nei 2007). Furthermore, the sequences of ORs are highly 
variable and notoriously difficult for automatic gene annota- 
tion. Accordingly, our main data set of single-copy orthologs 
was depleted in genes involved in olfaction (supplementary 
tables S7, S8, S10, and S11, Supplementary Material online) 
and GO categories related to olfaction could not be tested for 
enrichment of positively selected genes because they included 
too few annotated genes. We therefore used a more compre- 
hensive data set of 873 manually annotated protein-coding 
sequences of OR genes (excluding suspected pseudogenes) 
provided by Hugh Robertson for P. barbatus (291 genes) 
(Smith, Smith, et al. 2011), L humile (320 genes) (Smith, 
Zimin, et al. 2011), and N. vitr'ipenriis (262 genes) (Werren 
et al. 2010). Nucleotide sequences were translated and 
amino-acid sequences were aligned using MAFFT (Katoh 
et al. 2005). Unreliably aligned residues were masked using 
GUIDANCE based on 32 bootstrap samples and a cutoff of 0.2 
that was chosen so that the 15% lowest scoring residues are 
masked (Penn et al. 2010; Privman et al. 2012). Phylogeny was 
reconstructed using RAxML with the JTT substitution matrix, 
the CAT approximation, and 100 bootstrap samples 
(Stamatakis 2006). Because the resulting gene tree was too 
large for an analysis with the branch-site test of Codeml, we 
divided it into 16 smaller subtrees, each containing less than 
100 leaves. Branches with as high as possible bootstrap sup- 
port were chosen as splitting points. The 16 subtrees include 
all ant sequences but only 105 N. vitr'ipenriis sequences. The 
sequences from each subtree were realigned using PRANK 
version 100701 (Loytynoja and Goldman 2008) and reverse- 
translated into corresponding codon alignments. GUIDANCE 
was used to mask unreliably aligned codons (0.8 cutoff). 
Phylogeny was reconstructed using RAxML as above. Out 
of 1,744 branches in the initial tree, 1,400 branches from 
the subtrees were tested using the branch-site test of 
Codeml (see above), and the computation was successful 
(both null and alternative hypotheses) for 1,184 branches. 



Significant branches are highlighted in red in figure 3 and in 
supplementary figure S1, Supplementary Material online. Full 
results of the branch-site test on all 16 clades are shown in 
supplementary table S18, Supplementary Material online. 
A full tree with branch names and bootstrap values is pro- 
vided as supplementary figure S1, Supplementary Material 
online. Newick trees of the 16 individual subtrees along 
with annotation of tested branches are available in supple- 
mentary text, Supplementary Material online. 

Tests of Functional Category Enrichment 
GO (Ashburner et al. 2000) annotations for gene families were 
taken from the annotation of the D. melanogaster gene 
member they include (downloaded from http://flybase.org/ 
static_pages/down loads/FB201 1 _02/go/gene_association.fb. 
gz, last accessed April 24, 2014). The annotation of children 
GO categories was propagated to their parent categories fol- 
lowing the GO graph structure. GO categories mapped to 10 
genes or less were discarded for the enrichment analysis. 

To identify over- and underrepresented functional catego- 
ries present in the data sets used in this study, the package 
topGO version 2.4 (Alexa et al. 2006) of Bioconductor 
(Gentleman et al. 2004) was used. A Fisher's exact test was 
used, combined with the "elim" algorithm of topGO, which 
decorrelates the graph structure of the GO to reduce nonin- 
dependence problems (Alexa et al. 2006). The reference set 
was constituted of all OrthoDB families including a D. mela- 
nogaster gene with GO annotation. GO categories with an 
FDR < 20% are reported (Benjamini and Hochberg 1995). 

Regarding the functional enrichment of genes targeted by 
positive selection, the Fisher's exact test approach has been 
criticized because it imposes the choice of an arbitrary cutoff 
to dichotomize genes into "significant" and "nonsignificant" 
categories. This leads to a loss of information and limits the 
power and robustness of this method (Allison et al. 2006; 
Tintle et al. 2009; Daub et al. 2013). To test for GO functional 
categories for enrichment for positively selected genes, we 
instead used a gene set enrichment approach, which tests 
whether the distribution of scores of genes from a gene set 
differs from the whole data set scores distribution, allowing 
the detection of gene sets that contain many marginally sig- 
nificant genes. Different implementations for this approach 
have been proposed. The most widely used is the gene set 
enrichment analysis (GSEA) (Subramanian et al. 2005), but it 
was shown to perform relatively poorly (Kim and Volsky 2005; 
Efron and Tibshirani 2007; Tintle et al. 2009). Here, we used a 
SUMSTAT test: for a given gene set g including n genes, the 
SUMSTAT statistic is defined as the sum of scores of the n 
genes. This statistic was shown to be more sensitive than a 
panel of other methods, while controlling well for the rate of 
false positives (Ackermann and Strimmer 2009; Tintle et al. 
2009; Fehringer et al. 2012; Daub et al. 2013). To be able to use 
the distribution of log-likelihood ratios of the positive selec- 
tion test — which follows a chi-square distribution with 1 
degree of freedom and spans several orders of magnitude — 
as scores in the SUMSTAT test, we applied a fourth root 
transformation as variance stabilizing method. This 
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transformation conserves the ranks of gene families (see 
http://udel.edu/~mcdonald/stattransform.html, last accessed 
April 24, 2014) (Canal 2005; McDonald 2009). According to 
the Central Limit Theorem, the distribution of SUMSTAT 
scores from random gene sets approaches a normal distribu- 
tion whose mean and variance derives from the mean and 
variance of the scores of the complete list of tested genes C: 

E(SUMSTAT) = n • E(G) 

and 

Var(SUMSTAT) = n • Var(C). 

We performed bidirectional tests against this distribution to 
test whether the SUMSTAT statistic for a given gene set is 
higher or lower than expected by chance, corresponding to 
respectively enrichment or depletion for positively selected 
genes in this gene set. We verified the accuracy of this meth- 
odology by drawing an empirical null distribution for each 
gene set of size n found in the real data set, based on scores of 
10,000 gene sets of same size n randomly picked from the 
whole data set. The distribution of SUMSTAT scores of these 
randomized gene sets approximates closely a normal distri- 
bution, even when the set size is small (supplementary fig. S6, 
Supplementary Material online). This makes the SUMSTAT 
test less computationally intensive than other gene set en- 
richment approaches (e.g., GSEA) (Subramanian et al. 2005) 
where the null distribution cannot be inferred mathematically 
and randomizations have to be performed for each individual 
test. We verified that a GSEA approach gave broadly similar 
results (not shown). 

Because different gene sets sometimes share many genes in 
common, the list of significant gene sets resulting from en- 
richment tests is usually highly redundant. We implemented 
the "elim" algorithm from the Bioconductor package topGO, 
to decorrelate the graph structure of the GO (Alexa et al. 
2006). Briefly, the GO categories are tested recursively starting 
from the deeper levels of the GO tree, and the genes anno- 
tated to these significant categories are removed from all their 
parent categories. As the tests for different categories are not 
independent, it is not clear whether classical approaches to 
assess the FDR (e.g., Benjamini and Hochberg 1995) are accu- 
rate. Thus, we calculated empirically an FDR at each P-value 
threshold by performing 100 randomizations where the 
scores of gene families were permuted and the gene set en- 
richment test rerun. The FDR is estimated as 

FP N 0 

FDR = = — , 

FP + TP N t 

where at a given P-value threshold N 0 represents the mean 
number of false positives obtained in the randomizations and 
N t represents the number of positives obtained with the real 
data set. The FDR obtained with this approach was in good 
agreement with the Benjamini-Hochberg FDR (Benjamini 
and Hochberg 1995). GO categories with an FDR < 20% are 
reported. Functional categories depleted in positive selection 
reflect the most conserved sets of functional categories, under 



the action of purifying selection. These are not discussed in 
this article. 

The gene set enrichment test ran on each individual 
branch of the tree with results of the branch-site test yields 
heterogeneous results, probably resulting from differences in 
power of the branch-site test on different branches of the 
phylogeny (supplementary table S9, Supplementary Material 
online; only branches Sinv, Pbar, Hsal, #3 and #6 show some 
significant categories at FDR 20%). This test could also be 
sensitive to false positive results of the branch-site test (e.g., 
GC-biased gene conversion, discussed in supplementary text, 
Supplementary Material online). Thus, we designed a test less 
sensitive to these problems. We considered a unique score 
per gene family reflecting the evidence for positive selection 
globally in the ant lineage, the mean of the branch-site test 
scores on the 13 individual ant branches. This scoring scheme 
should unveil functional categories of genes that experienced 
extensive and probably recurrent episodes of positive selec- 
tion in the ant lineage, but is not strictly equivalent to using 
the results of a site test on ants branches, since it allows the 
detection of gene families with positive selection events af- 
fecting different sites on different branches. We also checked 
that in most cases, the enriched categories were not signifi- 
cant only because of a single outlier gene with a strong pos- 
itive selection score, but displayed a significant shift in the 
distribution of positive selection scores of numerous genes 
(supplementary fig. S7, Supplementary Material online). 

Finally, as a sanity check, the gene set enrichment test was 
also performed using KEGG pathways annotation. KEGG 
pathways and the mapping to D. melanogaster genes were 
downloaded with the KEGG REST API (http://www.kegg.jp/ 
kegg/rest/keggapi.html, last accessed April 24, 2014). Because 
hierarchical relationships among KEGG pathways are limited, 
we did not use the "elim" decorrelation algorithm. Pathways 
mapped to more than 10 genes were retained. In total, 51 
KEGG pathways were tested. 

Tests of Phenotypic Category Enrichment 
Mutant phenotype annotations of D. melanogaster genes 
were extracted from Fly base (Drysdale 2001; Osumi- 
Sutherland et al. 2013). The following ontologies were down- 
loaded from the OBO foundry (Smith et al. 2007): the Flybase 
controlled vocabulary ontology (http://www.berkeleybop. 
org/ontologies/obo-all/flybase_vocab/flybase_vocab.obo, 
last accessed April 24, 2014), the Drosophila anatomical on- 
tology (http://www.berkeleybop.org/ontologies/obo-all/fly_ 
anatomy/fly_anatomy.obo, last accessed April 24, 2014), and 
the Drosophila developmental stages ontology (http:// 
www.berkeleybop.org/ontologies/obo-all/fly_development/ 
fly_development.obo, last accessed April 24, 2014). The 
relationships between genes and alleles, and between al- 
leles and phenotypes (anatomical and developmental on- 
tology categories) were extracted from Flybase (ftp://ftp. 
flybase.net/releases/FB2012_01/reporting-xml/FBgn.xml. 
gz, last accessed April 24, 2014; "derived_pheno_class" and 
"derived_pheno_manifest" entities). The information on 
gain or loss-of-function alleles was extracted from the file 
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ftp://ftp.flybase.net/releases/FB2012_01/reporting-xml/ 
FBal.xml.gz (last accessed April 24, 2014) (loss of function: 
controlled vocabulary term FBcv:0000287 and child terms; 
gain of function: FBcv:0000290 and child terms). The an- 
notation of child phenotypic categories (anatomy of devel- 
opment) was propagated to their parent categories 
following the respective ontologies structures. 

To perform an enrichment analysis based on mutant phe- 
notypes in fruit fly, we used the SUMSTAT test. Because the 
annotation is scarcer than the CO annotation, we used only 
the categories mapped to more than five genes for the en- 
richment analysis. The reported results include the annota- 
tion for gain and loss-of-function alleles. We observed very 
similar results when gain-of-function alleles were removed 
from the annotation (Weng and Liao 2011) (not shown). 

Expression Data 

Microarray expression data from S. mv'icta (Ometto et al. 
2011) were provided by the authors upon request. These 
included expression levels of clones of the spotted microarray 
used, as well as the list of genes identified to be overexpressed 
in each of the three castes (workers, queens, and males), both 
at pupal and adult stages. The mapping of clones to the gene 
model of S. mv'icta (OGS_2.2.3) (Wurm et al. 2011) was pro- 
vided by Y. Wurm, and is similar to the mapping used in Hunt 
et al. (2011). If multiple clones mapped to the same gene, the 
average signal was used for expression. For differential expres- 
sion, we used the results of the original study (BAGEL analysis, 
where a clone was considered to be differentially expressed 
between conditions if the Bayesian posterior probability was 
P< 0.001, corresponding to an FDR~5%) (Ometto et al. 
2011). A gene was considered differentially expressed if at 
least one clone mapped to it was found differentially ex- 
pressed. Expression data were available for 1,327 genes of 
the single-copy orthologs data set, including 603 genes 
overexpressed in at least one condition. We ran a 
SUMSTAT gene set enrichment test on the sets of genes 
with caste-specific expression (pupal male, pupal queen, 
pupal worker, adult male, adult queen, and adult worker). 
P-values were obtained by comparison to an empirical distri- 
bution created with 10,000 randomizations of gene scores. 

Aging Genes 

Aging and oxidative stress associated genes were obtained 
from a microarray study in D. melanogaster comparing the 
expression of genes in 10-day-old flies to 61 -day-old flies, and 
flies exposed to 100% 0 2 for 7 days to controls (Landis et al. 
2004). We tested the enrichment for positively selected genes 
(SUMSTAT test) in four gene sets constituted of up and 
downregulated genes in both contrasts. P-values were ob- 
tained by comparison to an empirical distribution created 
with 10,000 randomizations of gene scores. 

Genes with Mitochondrial Function 
Genes with mitochondrial function were identified as those 
mapped to any of the 310 GO categories including "mito- 
chondria*" in their names or synonym names (using the 



search engine on http://amigo.geneontology.org/, last 
accessed April 24, 2014). Three hundred and thirteen of the 
identified genes had available microarray expression data in 
S. invicta. 

Data Availability 

Raw and filtered alignments used in these analyses track files 
for the alignment editor Jalview (Clamp et al. 2004), Codeml 
control files and result files can be downloaded at http:// 
bioinfo.unil.ch/supdata/Roux_positive_selection_ants/Roux_ 
et_al_datasets.tar.gz (last accessed April 24, 2014). 

A simple web interface displaying gene families, GO map- 
ping, Codeml results, and alignments (through a Jalview 
applet) is available at http://bioinfo.unil.ch/supdata/Roux_ 
positive_selection_ants/families.html (last accessed April 24, 
2014). Jalview tracks display the regions used or filtered out in 
the original protein alignments, as well as the residues found 
to be under positive selection by Bayes Empirical Bayes (Yang 
et al. 2005) in all the branches tested for each of the three 
replicate runs (fig. 2). 

Supplementary Material 

Supplementary tables S1-S26, figures S1-S7, and supplemen- 
tary text are available at Molecular Biology and Evolution 
online (http://www.mbe.oxfordjournals.org/). 
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